The Annals of Applied Statistics
2011, Vol. 5, No. 1, 400-426 
DOI: 10.1214/10-AOAS391 

(c) Institute of Mathematical Statistics, 2011 

HIV DYNAMICS AND NATURAL HISTORY STUDIES: JOINT 
MODELING WITH DOUBLY INTERVAL-CENSORED EVENT 
TIME AND INFREQUENT LONGITUDINAL DATA

By Li Su and Joseph W. Hogan 

MRC Biostatistics Unit and Brown University 

Hepatitis C virus (HCV) coinfection has become one of the most
challenging clinical situations to manage in HIV-infected patients.
Recently the effect of HCV coinfection on HIV dynamics following 
initiation of highly active antiretroviral therapy (HAART) has drawn 
considerable attention. Post-HAART HIV dynamics are commonly
studied in short-term clinical trials with a frequent data collection
design. For example, the elimination process of plasma virus during
treatment is closely monitored with daily assessments in viral dynamics
studies of AIDS clinical trials. In this article, we instead use
infrequent cohort data from long-term natural history studies and
develop a model for characterizing post-HAART HIV dynamics and 
their associations with HCV coinfection. Specifically, we propose a 
joint model for doubly interval-censored data for the time between 
HAART initiation and viral suppression, and the longitudinal CD4 
count measurements relative to the viral suppression. Inference is ac- 
complished using a fully Bayesian approach. Doubly interval-censored 
data are modeled semiparametrically by Dirichlet process priors and 
Bayesian penalized splines are used for modeling population-level and 
individual-level mean CD4 count profiles. We use the proposed meth- 
ods and data from the HIV Epidemiology Research Study (HERS) to 
investigate the effect of HCV coinfection on the response to HAART. 

1. Introduction. 

1.1. HIV dynamics following initiation of antiviral therapy. The wide- 
spread use of highly active antiretroviral therapies (HAART) against HIV in the 
United States has resulted in reducing the burden of HIV-related morbidity 
and mortality [Jacobson, Phair and Yamashita (2004)]. HIV dynamics following
HAART are usually studied in short-term clinical trials with a frequent
data collection design. For example, in viral dynamics studies of AIDS clinical
trials, the elimination process of plasma virus after treatment is closely
monitored with daily measurements, which has led to a new understanding
of the pathogenesis of HIV infection and provides guidance for treating
AIDS patients and evaluating antiviral therapies [Wu (2005)]. In this
article, HIV dynamics refers to a two-part response to HAART: viral
suppression and concurrent or subsequent immune reconstitution. In clinical
practice, the virus is considered suppressed when plasma HIV RNA (viral
load) is below a lower limit of detection; the degree of immune reconstitution
is commonly measured by the change in CD4+ lymphocyte cell count (CD4
count) after HAART initiation.

It is well known that CD4+ lymphocyte cells are targets of HIV and their
abundance declines after HIV infection. Investigators have studied the asso- 
ciation between viral load and CD4 count during HAART treatment and, in 
general, they are negatively correlated [Lederman et al. (1998); Liang, Wu 
and Carroll (2003)]. Longitudinal data on these markers have been analyzed 
separately, particularly by using random-effects models. Recently, bivariate 
linear mixed models were proposed to jointly model viral load and CD4 
count by incorporating correlated random effects. These models were spec- 
ified in terms of concurrent association between viral load and CD4 count 
[Thiebaut et al. (2005); Pantazis et al. (2005)]. However, a natural time 
ordering for virologic and immunologic response to HAART (or any antivi- 
ral therapy) is often observed: when a patient begins a successful HAART 
regimen, viral replication is usually inhibited first, leading to a decrease in 
viral load; then, CD4 count often increases as the immune system begins to 
recover. Consequently, an increase in CD4 count is thought to depend on the
degree of viral suppression; it may be slower to respond than viral load or 
it may not increase at all if the virus is not suppressed [Jacobson, Phair 
and Yamashita (2004)]. Therefore, it would be advantageous to acknowl- 
edge these common sequential changes of viral load and CD4 count when 
modeling post-HAART HIV dynamics. 

1.2. Coinfection with Hepatitis C virus and HIV dynamics. Hepatitis 
C virus (HCV) coinfection is estimated to occur in 30% of HIV-infected 
patients in the United States and has become one of the most challeng- 
ing clinical situations to manage in HIV-infected patients [Sherman et al. 
(2002)]. Several studies have suggested that HCV serostatus is not associ- 
ated with the virologic response to HAART [Greub et al. (2000); Rockstroh 
et al. (2005)]. However, the evidence for immunologic response is conflicting. 
Some studies have shown that HIV-HCV coinfected patients have a blunted 
immunologic response to HAART, compared to those with HIV infection 
alone, although others have found comparable degrees of immune reconstitution
in persons with HIV-HCV coinfection [Miller et al. (2005); Stebbing
et al. (2005); Rockstroh (2006); Sullivan et al. (2006)]. A primary motivation 
of our model is to investigate the effect of HCV coinfection on post-HAART 
HIV dynamics using cohort data from natural history studies. We focus on 
two important questions: (1) Do HCV-negative patients have a shorter time
from HAART initiation to viral suppression? (2) Do HCV-negative patients 
have better immune reconstitution at the time of viral suppression? Note 
that in the second question the sequential nature of the virologic and im- 
munologic response to HAART is emphasized. 

1.3. HIV natural history studies and the HERS. Because the incidence 
of clinical progression to AIDS fell rapidly following the widespread intro- 
duction of HAART in 1997, long-term clinical trials in patients with HIV 
have become time-consuming and expensive [Mocroft et al. (2006)]. Currently,
natural history studies are the major source of knowledge about the HIV 
epidemic and the full treatment effect of HAART over the long term. For ex- 
ample, studies such as Multicenter AIDS Cohort Study (MACS), Women's 
Interagency HIV Study (WIHS) and Swiss HIV Cohort Study (SHCS) have 
played important roles in understanding the science of HIV, the AIDS epi- 
demic and the effects of therapy [Kaslow et al. (1987); Ledergerber et al. 
(1994); Barkan et al. (1998)]. In HIV natural history studies, HIV viral 
load and CD4 count are usually measured with wide intervals (e.g., every 6 
months approximately). Therefore, for some event time of scientific interest, 
for example, the time between HAART initiation and viral suppression, both 
the time origin (HAART initiation) and the failure event (viral suppression) 
could be interval-censored. This situation is referred to as 'doubly interval- 
censored data' in the literature. In fact, the statistical research on doubly 
interval-censored data was primarily motivated by scientific questions in HIV 
research, for example, modeling 'AIDS incubation time' between HIV infec- 
tion and the onset of AIDS [De Gruttola and Lagakos (1989); Sun (2006)]. 
Both nonparametric and semiparametric methods have been proposed for 
the estimation of the distribution function of the AIDS incubation time and 
its regression analysis. A comprehensive review on the analysis of doubly 
interval-censored data can be found in Sun (2006). 

The HIV Epidemiology Research Study (HERS) is a multi-site longitudi- 
nal cohort study of HIV natural history in women between 1993 and 2001 
[Smith et al. (1997)]. At baseline between 1993 and 1995 the study enrolled 
871 HIV-seropositive women and 439 HIV-seronegative women at high risk 
for HIV infection. Participants were scheduled for approximately a 6-year 
follow-up, where a variety of clinical, behavioral and sociologic outcomes 
were recorded approximately every 6 months, with each measurement linked
to its visit date. The top part of Table 1 gives selected baseline characteristics
of the 1310 study participants; more details can be found in Smith et al. (1997).
Quantification of HIV RNA viral load was performed using a branched-DNA
(B-DNA) signal amplification assay with a detection limit of 50 copies/ml,
and flow cytometry of whole blood was used to determine CD4 counts
at each visit. All participants were HAART-naive at baseline. During the
study, 382 participants reported HAART use based on information gathered
during in-person interviews. Because assessments were scheduled to be carried
out every 6 months and participants were only asked whether
they had been on HAART during the last 6 months, exact dates of HAART
initiation are not available. The analysis in Section 4 includes 374 women
with HAART use who had HIV seroconversion before baseline and baseline
HCV coinfection information. Some characteristics of these 374 women are
presented in the bottom part of Table 1.

Table 1
Selected characteristics of the 1310 HERS women (top) and the 374 HERS women
included in the analysis in Section 4 (bottom)

                                                 HIV-positive    HIV-negative
                                                 (N = 871)       (N = 439)
Median age at enrollment                         35.0            34.5
Age range at enrollment                          16.4-55.2       16.6-
Injection drug user at enrollment (%)            25.1            26.4
CD4 count at enrollment (%)
    <200                                         17.1            0.0
    200-499                                      50.7            1.7
    >500                                         32.2            98.3
HCV antibody test at enrollment (%)
    Positive                                     60.3            47.8
    Negative                                     38.8            50.8
    Missing                                      0.9             1.4

                                                 HCV-positive    HCV-negative
                                                 (N = 208)       (N = 166)
Median follow-up time (months)                   67.3            71.0
Median age at enrollment                         36.7            33.1
Age range at enrollment                          21.2-55.0       19.0-55.2
Injection drug user at enrollment (%)            29.8            2.4
Ever on antiviral treatment before 1996 (%)      57.2            62.1
CD4 count before first reported HAART use (%)
    <200                                         34.6            36.8
    200-499                                      52.9            45.8
    >500                                         12.5            17.5

Figure 1 shows smoothing spline fits and the corresponding derivative 
(change rate) curves for average CD4 count and the prevalence of detectable 
viral load for the 374 HERS women, where the measurement times are centered
such that time 0 represents the earliest visit at which HAART use was
reported. The left panels indicate that the increasing trend for average CD4
count started later than the decreasing trend for viral load prevalence, but
this phenomenon is probably not related to HAART because the starting
times for these trends are 1-2 years before the reported HAART initiation
time. It may be more useful to examine the change rates for average CD4
count and viral load prevalence to assess the effectiveness of HAART. In fact,
the right panels of Figure 1 show that the maximum decreasing rate for viral
load prevalence occurred earlier (around 4 months before reported HAART
initiation) than the maximum increasing rate for average CD4 count (around
the reported HAART initiation), which suggests the possible sequential
relationship in post-HAART HIV dynamics discussed in Section 1.1.

Fig. 1. Top panels: smoothing spline fit and the corresponding derivative (change rate)
curve for average CD4 count since reported HAART initiation in the HERS cohort; bottom
panels: smoothing spline fit and the corresponding derivative (change rate) curve for the
prevalence of detectable viral load (>50 copies/ml) since reported HAART initiation in
the HERS cohort; solid lines: smoothing spline fits; dashed lines: derivative curves of the
smoothing spline fits; black dots: maximum of the increasing rate for average CD4 count
and maximum of the decreasing rate of viral load prevalence.
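The comparison in Figure 1 rests on locating where each change-rate curve peaks. As a rough sketch of that step, the code below evaluates two fitted curves on a grid and reads off the time of the steepest decrease and the steepest increase using central differences; this replaces the smoothing spline's analytic derivative with a numerical one (our simplification). The two sigmoid curves are made-up stand-ins chosen so that the peaks sit 120 days apart; they are not the HERS fits.

```python
import math

def central_differences(times, values):
    """Approximate derivative at each interior grid point."""
    return [(values[k + 1] - values[k - 1]) / (times[k + 1] - times[k - 1])
            for k in range(1, len(times) - 1)]

def time_of_extreme_rate(times, values, largest=True):
    """Grid time where the approximate derivative is largest (steepest increase)
    or, with largest=False, smallest (steepest decrease)."""
    rates = central_differences(times, values)
    pick = max if largest else min
    k = pick(range(len(rates)), key=rates.__getitem__)
    return times[k + 1]  # rates[k] belongs to interior point times[k + 1]

# Hypothetical fitted curves, in days relative to reported HAART initiation.
times = list(range(-720, 721, 10))
prevalence = [0.2 + 0.5 / (1 + math.exp((t + 120) / 60)) for t in times]  # falls
cd4 = [350 + 150 / (1 + math.exp(-t / 90)) for t in times]                 # rises

print(time_of_extreme_rate(times, prevalence, largest=False))  # -120
print(time_of_extreme_rate(times, cd4))                        # 0
```

With real data, the curves would be the smoothing spline fits evaluated on a fine grid, and the same peak-finding step applies.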



1.4. Modeling post-HAART HIV dynamics in the HERS. Our objective 
is to develop a model for the joint distribution of the time from HAART 



L. SU AND J. W. HOGAN 




Fig. 2. CD4 counts (on square root scale) and censoring intervals for 9 selected HERS
women, plotted against time since enrollment in days; dotted line: censoring intervals for
HAART initiation; solid line: censoring intervals for viral suppression following HAART;
circles represent the data from HCV-positive group and triangles represent the data from
HCV-negative group.

initiation to viral suppression, and the longitudinal CD4 counts relative to 
the viral suppression time following HAART. As discussed in Section 1.3, 
the time from HAART initiation to viral suppression is doubly interval- 
censored. Specifically, considering the reporting bias for HAART initiation, 
we define the right endpoint of its corresponding censoring interval to be the 
first visit of reported HAART use and the definition for the left endpoint is 
based on assumptions about the earliest possible time of HAART initiation 
in the HERS cohort. Further, viral suppression following HAART can be 
either interval-censored or right-censored. Details can be found in Section 4. 

Figure 2 shows CD4 counts and corresponding censoring intervals of 
HAART initiation and viral suppression following HAART for selected HERS 
women. As seen in the top left panel of Figure 2, viral suppression after
HAART can be right-censored due to participant dropout, death and/or
study end. Similarly, participants could have incomplete CD4 count
measurements across the 12 scheduled follow-up visits. However, because we focus on
the subpopulation of HAART users in the HERS cohort, the missingness
rate is relatively low compared to the whole HERS population; 90.64% of
the 374 women in our analysis had at least 10 visits. Therefore, for the
HERS analysis in Section 4, we assume that the missing data are
missing at random [Little and Rubin (2002)]. Given that the parameters
for modeling the missing data mechanism and the outcomes are distinct
and have independent priors, the missing data are ignorable when
making posterior inference about the outcomes.

The remainder of the article is organized as follows. In Section 2 we specify 
the joint model for doubly interval-censored event time and longitudinal 
CD4 count data. Section 3 describes the posterior inference and gives full 
conditional distributions for Gibbs steps. We use the model to analyze the 
HERS data for investigating the HCV coinfection problem introduced in 
Section 1.2, and present the results in Section 4. The conclusion and some 
discussion are given in Section 5. 

2. A model for post-HAART HIV dynamics. 

2.1. Model under an idealized situation. Our goal is to model the joint 
distribution of the time from HAART initiation to viral suppression and 
the longitudinal CD4 counts. Figure 3 is a schematic illustration of the 
variables of interest under an idealized situation. Let t (t > 0) denote the 
time since enrollment and let H and V represent the time from enrollment 
to HAART initiation and the time from enrollment to viral suppression after 
HAART, respectively. By definition, V > H, and W = V - H is the time from
HAART initiation to viral suppression. Further, $Y(t_1), Y(t_2), \ldots, Y(t_n)$ are
CD4 count measurements taken at time points $t_1 < \cdots < t_n$. Throughout
this article, the time points $t_1 < \cdots < t_n$ are assumed to be noninformative
and fixed by study design. Let X denote covariates, for example, the baseline
HCV serostatus. The joint density of W and $Y(t_1), Y(t_2), \ldots, Y(t_n)$ given
X, H and $t_1, \ldots, t_n$ can be written as

$$
p(w, y_1, y_2, \ldots, y_n \mid X, h, t_1, \ldots, t_n)
  = p(w \mid X, h)\, p\{y_1, y_2, \ldots, y_n \mid X, t_1 - (h + w), \ldots, t_n - (h + w)\}.
\tag{2.1}
$$

We condition on H because we are not interested in the marginal
distribution of H; the observed H = h is used only as the time origin
for W.

Fig. 3. A scheme of the variables of interest under an idealized situation for post-HAART
HIV dynamics: the origin represents enrollment, t indexes the time since enrollment, H is
the HAART initiation time, V is the viral suppression time following HAART, W is the time
from HAART initiation to viral suppression, and $Y(t_1), Y(t_2), \ldots, Y(t_n)$ are CD4 count
measurements with their expectations represented by the curve.

The factorization in (2.1) is based on the sequential relationship in post-HAART
dynamics. When a HAART regimen is successful in suppressing the
virus, we are able to obtain W, the time from HAART initiation to vi- 
ral suppression. As mentioned in Section 1.1, there is a time ordering of 
virologic response and immunologic response to HAART. Because of this 
sequential relationship of virologic and immunologic response as well as the 
large between-individual heterogeneity in terms of the ability to suppress vi- 
ral replication, the time to suppression and the durability of suppression, we 
believe that the mean CD4 count profiles from different individuals are more 
comparable after realigning measurement times by their individual viral
suppression times following HAART. Therefore, we assume that the
distribution of $Y(t_1), Y(t_2), \ldots, Y(t_n)$ given X depends on H and W only through a
change in the time origin for the measurement times $t_1, \ldots, t_n$. This is
similar to curve registration, a method that originated in the functional data analysis
literature [Ramsay and Li (1998)] for dealing with the situations where the 
rigid metric of physical time for real life systems is not directly relevant to 
internal dynamics. For example, the timing variation of salient features of 
individual puberty growth curves (e.g., time of puberty growth onset, time 
of peak velocity of puberty growth) can result in the distortion of population 
growth curves [Ramsay and Silverman (2005)]. Likewise, in our case, simply 
averaging individual CD4 count profiles along the time since enrollment (t) 
or the time since HAART initiation (t - H) can attenuate the true population
immunologic response profile following HAART. Because viral suppression 
is the main driving force of immune reconstitution [Jacobson, Phair and 
Yamashita (2004)], it is sensible to center the time scale at individual viral 
suppression times (V = H + W) in order to describe the trends in immune 
reconstitution at the population level. 
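The attenuation argument can be checked on a toy example. The sketch below assumes four hypothetical individuals who share one CD4 profile relative to their own suppression time $V_i$ but reach suppression at different times. In this example, averaging along time since enrollment halves the steepest rise, while realigning each measurement to $t - V_i$ before averaging recovers the shared profile. The profile shape and the $V_i$ values are illustrative assumptions, and $V_i$ is treated as known here, which is exactly what doubly interval-censored data do not allow.

```python
from collections import defaultdict

def cd4_profile(s):
    """Hypothetical common profile as a function of time s (years) relative to
    viral suppression: flat before suppression, rising linearly for 2 years,
    then flat."""
    if s <= 0:
        return 15.0
    return 15.0 + 2.0 * min(s, 2.0)

def max_step(curve):
    """Largest increase between consecutive points of a curve."""
    return max(b - a for a, b in zip(curve, curve[1:]))

suppression_times = [1.0, 2.0, 3.0, 4.0]  # hypothetical V_i, years since enrollment
grid = [0.5 * k for k in range(13)]       # shared visit times, 0 to 6 years

# Average along time since enrollment t: individuals rise at different t.
by_enrollment_time = [
    sum(cd4_profile(t - v) for v in suppression_times) / len(suppression_times)
    for t in grid
]

# Realign each observation to s = t - V_i, then average within each s.
by_relative_time = defaultdict(list)
for v in suppression_times:
    for t in grid:
        by_relative_time[t - v].append(cd4_profile(t - v))
realigned = [sum(ys) / len(ys) for _, ys in sorted(by_relative_time.items())]

print(max_step(by_enrollment_time), max_step(realigned))  # 0.5 1.0
```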

However, as mentioned in Section 1.3, W can be doubly interval-censored 
in HIV natural history studies, which presents a challenge in making
inferences about the density in (2.1). In fact, for
$p\{y_1, y_2, \ldots, y_n \mid X, t_1 - (h + w), \ldots, t_n - (h + w)\}$, we are faced with a situation similar to the missing
or interval-censored covariate problem in generalized linear model literature 
[Chen et al. (2005); Calle and Gomez (2005)]. To accommodate this situ- 
ation, we will extend the semiparametric Bayesian approach in Calle and 
Gomez (2005) by modeling H and W simultaneously. Note that here we 
model the observed H only for taking into account the uncertainty in the 
time origin of W; we do not intend to make inference about the marginal
distribution for HAART initiation time, which would require the right-censored data
from those participants who did not initiate HAART during the study. This 
is different from the AIDS incubation time problem which motivated the 
research in doubly interval-censored data, where both HIV infection time 
and AIDS incubation time are of interest and HIV infection time can be 
right-censored [De Gruttola and Lagakos (1989)]. Moreover, for the HERS 
cohort, HAART was not available before 1996; therefore, when HAART ini- 
tiation time is of scientific interest, it is not valid to use enrollment as the 
time origin because no HERS woman was at risk for HAART initiation
between enrollment and 1996. However, for the purpose of accommodating
uncertainty in the time origin of W, we can still use the observed censoring
intervals for H with enrollment as their time origin. 

In the following sections, we present the details of the proposed joint 
model for the HERS data. 

2.2. Model with doubly interval-censored data.

2.2.1. Observed data. Recall that all HERS women were HAART-naive
at baseline. For those who initiated HAART during follow-up, let H be a
positive random variable representing the time from enrollment to HAART
initiation. Participants were monitored only periodically, and at each follow-up
visit they only reported whether they had been on HAART treatment since
the last visit. Hence, the true value for H is only known to lie within an
interval $(L^H, R^H]$, where $L^H$ is the time of the visit preceding HAART
initiation and $R^H$ is the time of the first visit at which HAART use is
reported.
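A minimal sketch of how the interval $(L^H, R^H]$ follows from the visit records, assuming each visit is stored as days since enrollment together with a yes/no HAART report; the function name and data layout are hypothetical. This corresponds to what Section 4 calls the 'narrow' definition.

```python
def haart_interval(visit_days, on_haart):
    """Return the censoring interval (L_H, R_H] for one participant.

    visit_days: visit times in days since enrollment, in increasing order.
    on_haart:   for each visit, whether HAART use since the last visit was reported.
    R_H is the first visit reporting HAART use and L_H the visit before it;
    returns None if HAART use is never reported.
    """
    for k, reported in enumerate(on_haart):
        if reported:
            if k == 0:
                raise ValueError("HAART reported at baseline; HERS women were HAART-naive")
            return visit_days[k - 1], visit_days[k]
    return None

visits = [0, 183, 365, 548, 730]  # hypothetical semiannual schedule
print(haart_interval(visits, [False, False, False, True, True]))  # (365, 548)
```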

Let V be the time from enrollment to viral suppression following HAART
initiation. By definition, V > H. For those whose viral load has been
suppressed, V is observed to be in an interval $(L^V, R^V]$, where $L^V$ and $R^V$ are
defined similarly to $L^H$ and $R^H$. For those whose viral load was not
suppressed during follow-up, $V \in (L^V, +\infty)$, which corresponds to right
censoring of V. Because right censoring can be treated as a special case of
interval censoring with $R^V = +\infty$, we simply write $V \in (L^V, R^V]$. The time
between HAART initiation and viral suppression is W = V - H. At a given
value for H, $(L^H, R^H]$ and $(L^V, R^V]$ can overlap because virus
suppression can occur quickly after HAART initiation but before the next visit; therefore,
$W \in (\max\{0, L^V - H\}, R^V - H]$.
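The bounds on W can be computed directly once a value of H is given. In this sketch, H is treated as already imputed (h), times are in days since enrollment, and the helper name is hypothetical.

```python
import math

def w_interval(h, l_v, r_v=math.inf):
    """Return (lower, upper] for W = V - H given H = h and V in (l_v, r_v].

    Right-censored viral suppression is encoded as r_v = +inf. Because the
    intervals for H and V can overlap, l_v - h may be negative; W > 0 anyway,
    so the lower bound is max(0, l_v - h).
    """
    if r_v <= h:
        raise ValueError("need r_v > h, since V > H")
    return max(0.0, l_v - h), r_v - h

print(w_interval(400, 365, 548))  # overlapping intervals -> (0.0, 148)
print(w_interval(400, 548))       # right-censored V -> (148, inf)
```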
Further, we observe CD4 counts $Y = (Y_1, \ldots, Y_n)^T$ at time points $t_1, \ldots, t_n$,
which can differ across individuals, and X denotes the covariates, which include
baseline HCV coinfection status Z, where Z = 1 indicates a positive HCV
antibody test and Z = 0 a negative one.

In summary, the observed data for a HAART user in the HERS cohort
consist of the observed CD4 counts Y, the covariate X, the observation
times $t_1, \ldots, t_n$ and the intervals $(L^H, R^H]$, $(L^V, R^V]$ that respectively
include HAART initiation time H and viral suppression time V.

2.2.2. Noninformative assumption for interval-censoring. The joint density
for the above observed data and the unobserved H and W can be written
as

$$
\begin{aligned}
p(l^H, r^H, l^V, r^V, h, w, y \mid X, t_1, \ldots, t_n)
  &= p_0(l^H, r^H, l^V, r^V \mid X)\, p_1(h \mid X, l^H, r^H, l^V, r^V) \\
  &\quad \times p_2(w \mid X, h, l^H, r^H, l^V, r^V) \\
  &\quad \times p_3\{y \mid X, t_1 - (h + w), \ldots, t_n - (h + w), l^H, r^H, l^V, r^V\}.
\end{aligned}
\tag{2.2}
$$

Denote the cumulative distribution function (CDF) of H given X by
$G^H(h \mid X; \lambda^H)$, and the CDF of W given X by $G^W(w \mid X; \lambda^W)$. The
corresponding probability density functions (PDF) are $g^H(h \mid X; \lambda^H)$ and
$g^W(w \mid X; \lambda^W)$, respectively. We assume that the censoring of H and W occurs
noninformatively [Oller, Calle and Gomez (2004); Calle and Gomez (2005)], in
the following sense:

(a) $(L^H, R^H, L^V, R^V)$ provide no additional information about Y when H
and W are exactly observed. That is, the conditional density of Y
given $(X, H, W, t_1, \ldots, t_n)$ and $(L^H, R^H, L^V, R^V)$ does not depend on
$(L^H, R^H, L^V, R^V)$:

$$
p_3\{y \mid X, t_1 - (h + w), \ldots, t_n - (h + w), l^H, r^H, l^V, r^V\}
  = p_3\{y \mid X, t_1 - (h + w), \ldots, t_n - (h + w); \theta\}.
$$

(b) The only information about H and W provided by the observed censoring
intervals is that $(L^H, R^H]$ and $(L^V, R^V]$ contain H and V = H + W,
respectively. That is, the conditional density of H given X and $(L^H, R^H]$
satisfies

$$
p_1(h \mid X, l^H, r^H, l^V, r^V)
  = \frac{g^H(h \mid X; \lambda^H)}{G^H(r^H \mid X; \lambda^H) - G^H(l^H \mid X; \lambda^H)},
\tag{2.3}
$$

which corresponds to the density of H given X truncated in $(L^H, R^H]$.
Similarly, the conditional density of W given X, H and $(L^V, R^V]$ is

$$
p_2(w \mid X, h, l^H, r^H, l^V, r^V)
  = \frac{g^W(w \mid X; \lambda^W)}{G^W(r^V - h \mid X; \lambda^W) - G^W(\max\{0, l^V - h\} \mid X; \lambda^W)},
\tag{2.4}
$$

the truncated density $g^W(w \mid X; \lambda^W)$ in the interval $(\max\{0, L^V - H\}, R^V - H]$.
We denote (2.3) by $g^H_T(h \mid X, l^H, r^H; \lambda^H)$ and (2.4) by $g^W_T(w \mid X, h, l^V, r^V; \lambda^W)$,
where the subscript T stands for 'truncated' density.
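To make (2.3) and (2.4) concrete, here is a sketch with a normal density standing in for $g^W$. This is an assumption for illustration only: in the model, $G^H$ and $G^W$ are nonparametric with DPP priors, and normal distributions enter only as base measures. The parameter values (mean 120 days, SD 80 days) are hypothetical. Integrating the truncated density over its interval returns 1, which shows that the denominator renormalizes g to the censoring interval.

```python
import math

def normal_pdf(x, mu, sd):
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sd):
    # math.erf(inf) == 1.0, so x = +inf (right-censored V) is handled.
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def truncated_density(x, lower, upper, mu, sd):
    """g(x) / {G(upper) - G(lower)} on (lower, upper], 0 elsewhere, as in (2.3)-(2.4)."""
    if not lower < x <= upper:
        return 0.0
    return normal_pdf(x, mu, sd) / (normal_cdf(upper, mu, sd) - normal_cdf(lower, mu, sd))

# g^W_T for h = 400 and (l_v, r_v] = (365, 548]: W is truncated to (max(0, l_v - h), r_v - h].
h, l_v, r_v = 400.0, 365.0, 548.0
lo, hi = max(0.0, l_v - h), r_v - h
n = 10_000
step = (hi - lo) / n
mass = sum(truncated_density(lo + (k + 0.5) * step, lo, hi, 120.0, 80.0)
           for k in range(n)) * step  # midpoint rule over (lo, hi]
print(round(mass, 6))  # 1.0
```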

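Given these noninformative conditions, the joint density in (2.2) can be
simplified as

$$
\begin{aligned}
p(l^H, r^H, l^V, r^V, h, w, y \mid X, t_1, \ldots, t_n)
  &= p_0(l^H, r^H, l^V, r^V \mid X)\, g^H_T(h \mid X, l^H, r^H; \lambda^H) \\
  &\quad \times g^W_T(w \mid X, h, l^V, r^V; \lambda^W) \\
  &\quad \times p_3\{y \mid X, t_1 - (h + w), \ldots, t_n - (h + w); \theta\}.
\end{aligned}
\tag{2.5}
$$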

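2.2.3. Hierarchical structure of the model. To construct the observed
data likelihood, we index each individual's data by i = 1, ..., N and let $n_i$
be the number of observations for the ith individual; $(Y_i, X_i, L^H_i, R^H_i, L^V_i,
R^V_i, t_{i1}, \ldots, t_{in_i})$ are observed. If we denote by $[A \mid B; \mu]$ the conditional
distribution of random variable A, given random variable B and parameter $\mu$,
we can summarize our model by a hierarchical structure from a Bayesian
point of view:

$$
\begin{aligned}
[Y_i \mid X_i, H_i, W_i, t_{i1}, \ldots, t_{in_i}; \theta] &\sim P_3(y \mid X_i, t_{i1} - V_i, \ldots, t_{in_i} - V_i; \theta), \\
[W_i \mid X_i, H_i, L^V_i, R^V_i; \lambda^W] &\sim G^W_T(w \mid X_i, h_i, l^V_i, r^V_i; \lambda^W), \\
[H_i \mid X_i, L^H_i, R^H_i; \lambda^H] &\sim G^H_T(h \mid X_i, l^H_i, r^H_i; \lambda^H), \\
[L^H_i, R^H_i, L^V_i, R^V_i \mid X_i] &\sim P_0(l^H, r^H, l^V, r^V \mid X_i; \delta), \\
[\delta, \lambda^H, \lambda^W, \theta] &\sim F(\delta, \lambda^H, \lambda^W, \theta), \\
V_i &= h_i + w_i, \quad i = 1, \ldots, N,
\end{aligned}
\tag{2.6}
$$

where $P_3(\cdot)$, $G^W_T(\cdot)$, $G^H_T(\cdot)$, $P_0(\cdot)$ and $F(\cdot)$ are the corresponding distribution
functions. Assuming the independence of the priors for $\delta$ and $(\lambda^H, \lambda^W, \theta)$,
the marginal distribution of the censoring intervals $P_0(l^H, r^H, l^V, r^V \mid X_i; \delta)$
is not part of the posterior inference about $(\lambda^H, \lambda^W, \theta)$ because of the
noninformative censoring conditions. Therefore, we do not need to model
$P_0(l^H, r^H, l^V, r^V \mid X_i; \delta)$ explicitly.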






2.2.4. Semiparametric Bayesian approach for event time distributions. 
We use a semiparametric Bayesian approach for modeling H and W. The
CDFs $G^H$ and $G^W$ are left unspecified and not constrained to a parametric
family. Therefore, $G^H$ and $G^W$ are themselves unknown parameters, and
Dirichlet process priors [Ferguson (1973)] are assigned to them.

A Dirichlet process prior (DPP) on an unknown distribution G is a probability
distribution on the space of all possible distributions for G [Ferguson (1973)].
The parameters of a DPP are a parametric distribution $G_0(\cdot \mid \lambda)$, the base
measure, and a positive scalar $\alpha$. The base measure $G_0$ corresponds to the prior
expectation of the distribution function G. The precision parameter $\alpha$ indicates how
similar we believe the base measure $G_0$ and the nonparametric distribution
G are: the larger $\alpha$ is, the more tightly G concentrates around $G_0$. A DPP with
parameters $\alpha$ and $G_0$ is denoted by $P(\alpha G_0)$.
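The Pólya urn (Blackwell-MacQueen) representation, which Section 3 uses for posterior computation, also gives a simple way to see what $\alpha$ does. The sketch below draws a sample from $G \sim P(\alpha G_0)$ with G integrated out, using a normal base measure whose mean and SD are hypothetical. It illustrates the DPP only and is not the paper's sampler. With small $\alpha$, most draws repeat earlier values, so G looks like a few atoms; with large $\alpha$, most draws are fresh from $G_0$.

```python
import random

def polya_urn_draws(n, alpha, base_sampler, rng):
    """Draw theta_1, ..., theta_n whose joint law is that of an i.i.d. sample
    from G ~ DP(alpha * G0), with G integrated out (Blackwell-MacQueen urn).

    Draw k+1 is new from G0 with probability alpha / (alpha + k); otherwise it
    copies one of the k earlier draws chosen uniformly at random.
    """
    draws = []
    for k in range(n):
        if rng.random() < alpha / (alpha + k):
            draws.append(base_sampler(rng))
        else:
            draws.append(rng.choice(draws))
    return draws

def g0(rng):
    return rng.gauss(180.0, 90.0)  # hypothetical normal base measure, in days

rng = random.Random(2011)
few = polya_urn_draws(200, alpha=1.0, base_sampler=g0, rng=rng)
many = polya_urn_draws(200, alpha=100.0, base_sampler=g0, rng=rng)
print(len(set(few)), len(set(many)))  # small alpha gives far fewer distinct values
```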

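In the HERS analysis reported in Section 4, we include baseline HCV
status as the covariate for event time distributions. Therefore, adding
nonparametric DPPs for $G^H$ and $G^W$ with base measures $G^H_0$, $G^W_0$ and precision
parameters $\alpha^H$, $\alpha^W$, the initial hierarchical model structure in (2.6) can be
elaborated as

$$
\begin{aligned}
[Y_i \mid X_i, H_i, W_i, t_{i1}, \ldots, t_{in_i}; \theta] &\sim P_3(y \mid X_i, t_{i1} - V_i, \ldots, t_{in_i} - V_i; \theta), \\
[W_i \mid X_i, H_i, L^V_i, R^V_i] &\sim G^W_T(w \mid Z_i, h_i, l^V_i, r^V_i), \\
[G^W(\cdot \mid Z_i); \lambda^W, \alpha^W] &\sim P(\alpha^W G^W_0(\cdot \mid Z_i; \lambda^W)), \\
[H_i \mid X_i, L^H_i, R^H_i] &\sim G^H_T(h \mid Z_i, l^H_i, r^H_i), \\
[G^H(\cdot \mid Z_i); \lambda^H, \alpha^H] &\sim P(\alpha^H G^H_0(\cdot \mid Z_i; \lambda^H)), \\
[L^H_i, R^H_i, L^V_i, R^V_i \mid X_i] &\sim P_0(l^H, r^H, l^V, r^V \mid X_i; \delta), \\
[\delta, \lambda^H, \lambda^W, \theta] &\sim F(\delta, \lambda^H, \lambda^W, \theta), \\
V_i &= h_i + w_i, \quad i = 1, \ldots, N.
\end{aligned}
\tag{2.7}
$$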



2.2.5. Model for CD4 counts. In this section we describe the model for 
CD4 counts. Recall that our objective is to characterize mean CD4 count 
profiles relative to individual viral suppression times for HCV groups, after 
adjusting for other covariates. In other words, our focus is on the parameter
$\theta$ in $P_3(y \mid X_i, t_{i1} - (h_i + w_i), \ldots, t_{in_i} - (h_i + w_i); \theta)$. Since viral suppression
time V can be right-censored, individuals with V less than or equal
to the maximum follow-up time T are treated as HAART responders, while
those with V > T are considered nonresponders during the study period for
comparison purposes. It is also assumed that the mean CD4 count profiles
differ by both HAART responder group and HCV group; thus, different
smooth functions are used for these subpopulations. We realign the
measurement times by viral suppression times only for the HAART responder
group; for the nonresponder group the measurement time origin is still
participant enrollment.

In addition, there are other important covariates that are possibly associ- 
ated with immunologic response to HAART besides the HCV serostatus, for 
example, the overall CD4 level before HAART initiation and baseline injection
drug use information. Specifically, let $X^*_i$ be a vector of other covariates
excluding baseline HCV status $Z_i$, and let T be the maximum follow-up time
for the study. For $j = 1, \ldots, n_i$, we assume that the CD4 count at $t_{ij}$ for the
ith individual follows

$$
Y_{ij} \mid X^*_i, Z_i, V_i =
\begin{cases}
m_i(t_{ij} - V_i) + X^*_i \beta^* + e_{ij}, & \text{if } V_i \le T, \\
c_i(t_{ij}) + X^*_i \beta^* + e_{ij}, & \text{if } V_i > T,
\end{cases}
$$

where

$$
\begin{aligned}
m_i(t) &= Z_i \cdot m_1(t) + (1 - Z_i) \cdot m_0(t) + \gamma^m_i(t), \\
c_i(t) &= Z_i \cdot c_1(t) + (1 - Z_i) \cdot c_0(t) + \gamma^c_i(t).
\end{aligned}
$$

Here $m_1(t)$, $m_0(t)$, $c_1(t)$, $c_0(t)$ are smooth functions describing the population
CD4 count profiles that are specific to HCV serostatus, $\gamma^m_i(t)$ and $\gamma^c_i(t)$
are individual-level smooth functions that represent random deviations from
the population profiles, $\beta^*$ is the regression coefficient for $X^*_i$, and the
within-individual error term $e_{ij} \overset{\text{iid}}{\sim} N(0, \sigma^2)$. We assume that $e_{ij}$, $\gamma^m_i(t)$ and $\gamma^c_i(t)$
are mutually independent. Detailed specification for all smooth functions can
be found in the Appendix. Overall, $m_1(t)$, $m_0(t)$, $c_1(t)$, $c_0(t)$ can be considered
as fixed effects, $\gamma^m_i(t)$ and $\gamma^c_i(t)$ as random effects, and
$e_{ij}$ as the measurement error in the linear mixed model framework. Because
within-subject covariance is not of direct interest in our analysis, no stochas- 
tic process is further introduced into the CD4 count model except random 
effects and measurement error. However, when within-subject covariance 
is the target of inference, stochastic processes, for example, the integrated 
Ornstein-Uhlenbeck process in Taylor, Cumberland and Sy (1994), can be 
added. 
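A minimal sketch of the mean structure of the CD4 count model, written as a plain function. The function name, the argument layout and the linear stand-ins for the smooth functions are hypothetical (the paper uses penalized splines, described in the Appendix). It shows the two branches: responders are evaluated at $t - V_i$, nonresponders at t.

```python
def cd4_mean(t, v, z, x_star, beta_star, T, m1, m0, c1, c0,
             gamma_m=lambda s: 0.0, gamma_c=lambda s: 0.0):
    """Mean of Y_ij given V_i = v (the error e_ij is left out).

    t: measurement time since enrollment; z: 1 if HCV antibody positive, else 0;
    m1, m0, c1, c0: population smooth functions; gamma_m, gamma_c: individual
    deviations (default zero, i.e., a population-level curve).
    """
    fixed = sum(x * b for x, b in zip(x_star, beta_star))  # X*_i beta*
    if v <= T:  # HAART responder: time measured from viral suppression
        s = t - v
        return z * m1(s) + (1 - z) * m0(s) + gamma_m(s) + fixed
    # nonresponder (V_i > T): time origin stays at enrollment
    return z * c1(t) + (1 - z) * c0(t) + gamma_c(t) + fixed

# Hypothetical linear stand-ins for the penalized splines in the Appendix.
m1 = lambda s: 15.0 + 0.5 * s
m0 = lambda s: 16.0 + 1.0 * s
c1 = lambda t: 14.0
c0 = lambda t: 14.5

print(cd4_mean(3.0, 1.0, 1, [1.0], [-2.0], 6.0, m1, m0, c1, c0))           # 14.0
print(cd4_mean(3.0, float("inf"), 0, [1.0], [-2.0], 6.0, m1, m0, c1, c0))  # 12.5
```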



3. Prior specification and posterior inference. Gibbs sampling can be 
used to obtain posterior samples from the full conditional posterior
distributions of $\lambda^H$, $\lambda^W$ and $\theta$. Compared to the model with known H and W
in (2.1), the model in (2.7) involves an extra layer in the Gibbs steps. That 
is, at each iteration, the doubly interval-censored W together with H are 
sampled from their conditional posterior distributions, which results in a 
complete data set that is used to update the posterior distributions of the 
model parameters. 

For the HERS analysis in Section 4, we assume that the prior for $\theta$ and
the prior for $(\lambda^H, \lambda^W)$ are independent. Normal distributions are used as
base measures of the DPPs for $G^H$ and $G^W$. Different values of the precision
parameters $(\alpha^H, \alpha^W)$ are used to evaluate the sensitivity in estimating $G^H$
and $G^W$. For the CD4 count model, standard vague priors, such as the
normal-gamma conjugate family, are used.

Let $\mathbf{H} = (H_1, \ldots, H_N)^T$, $\mathbf{W} = (W_1, \ldots, W_N)^T$, $\mathbf{t}_i = (t_{i1}, \ldots, t_{in_i})^T$ and
$\mathbf{T} = (\mathbf{t}_1, \ldots, \mathbf{t}_N)^T$; $\mathbf{L}^H$, $\mathbf{R}^H$, $\mathbf{L}^V$ and $\mathbf{R}^V$ are the vectors of left and right
endpoints for censoring intervals. To derive the full conditional distributions
for model (2.7), we use the Pólya urn characterization of the DPP [Blackwell
and MacQueen (1973)] and extend the ideas of Escobar (1994) and Calle and
Gomez (2005). Specifically, we sample from $[\mathbf{H}, \mathbf{W}, \lambda^H, \lambda^W, \theta \mid Y_1, \ldots, Y_N,
X_1, \ldots, X_N, \mathbf{L}^H, \mathbf{R}^H, \mathbf{L}^V, \mathbf{R}^V, \mathbf{T}]$ by iterating the following steps: first, H and
W are imputed from their corresponding conditional distributions; second,
the parameter $\theta$ is updated using the complete data set obtained from the
first step and the current values of the remaining parameters; last, the parameters
$\lambda^H$, $\lambda^W$ are updated using the distinct values of the imputed H and W. Details
on priors and full conditional posterior distributions are given in the
Appendix.
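A sketch of the first (imputation) step for a single participant. It deliberately simplifies: fixed normal distributions, with hypothetical means and SDs in days, stand in for $G^H$ and $G^W$, whereas the actual sampler draws from the DP posterior through the Pólya urn (Appendix). What it does show is the ordering of the draws: H is imputed within its censoring interval first, and W is then drawn within an interval that depends on the imputed h, following (2.3)-(2.4).

```python
import random
from statistics import NormalDist

def sample_truncated(dist, lower, upper, rng):
    """Inverse-CDF draw from `dist` restricted to (lower, upper]."""
    lo, hi = dist.cdf(lower), dist.cdf(upper)
    if not hi > lo:
        raise ValueError("interval has (numerically) zero probability")
    u = rng.uniform(lo, hi)
    while not 0.0 < u < 1.0:  # inv_cdf needs 0 < u < 1; guards rounding at the ends
        u = rng.uniform(lo, hi)
    return dist.inv_cdf(u)

def impute_h_w(l_h, r_h, l_v, r_v, g_h, g_w, rng):
    """One imputation of (H, W): H from g^H truncated to (l_h, r_h], then
    W from g^W truncated to (max(0, l_v - h), r_v - h]."""
    h = sample_truncated(g_h, l_h, r_h, rng)
    w = sample_truncated(g_w, max(0.0, l_v - h), r_v - h, rng)
    return h, w

rng = random.Random(1)
g_h, g_w = NormalDist(450.0, 150.0), NormalDist(120.0, 80.0)  # hypothetical, in days
h, w = impute_h_w(365.0, 548.0, 365.0, 730.0, g_h, g_w, rng)
print(365.0 < h <= 548.0, 0.0 < w <= 730.0 - h)
```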

4. Data analysis. In this section we apply the joint model to the HERS data introduced in Section 1.3. We use two different definitions for the censoring intervals of HAART initiation and compare the results.

The first definition is based directly on the reported HAART use information, and we refer to the resulting intervals as 'narrow' intervals for H. Here $R_i^H$ is the first visit with reported HAART use and $L_i^H$ is the immediately preceding visit without HAART use. Under this definition, 159 patients (89 HCV seropositive, 70 HCV seronegative) have right-censored viral suppression time.

However, we find that some patients had viral suppression immediately before the self-reported HAART initiation, which could be due to reporting bias regarding HAART initiation. As a result, we might miss the true viral suppression time following HAART. We might also artificially create cases with right-censored viral suppression time, or with viral suppression occurring long after HAART initiation.

To reduce this impact in a conservative manner, we redefine all 374 left endpoints of the HAART initiation intervals to be March 11th, 1996. This is the left endpoint of the censoring interval for the first patient to report HAART use in the HERS cohort. Because the censoring intervals for HAART initiation are wider under this definition, we refer to them as 'wide' intervals for H. Under this definition, the number of patients with right-censored viral suppression time is reduced to 141 (78 HCV seropositive, 63 HCV seronegative).

Figure 4 shows the CD4 count data and the censoring intervals under the two definitions of HAART initiation time intervals for two selected women in the HERS cohort. In the left panel, the 'wide' definition for H also changes the interval for the viral suppression time V. In the right panel, the intervals for V remain the same.
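The two interval definitions can be made concrete for a single woman's visit record. The function name `haart_intervals` and its inputs (sorted visit dates and a per-visit flag for reported HAART use) are hypothetical. The only details taken from the text are these: the 'narrow' interval runs from the last visit without HAART to the first visit with reported HAART, and the 'wide' definition replaces every left endpoint with March 11th, 1996 while leaving the right endpoint unchanged.

```python
from datetime import date

# Common left endpoint used by the 'wide' definition (taken from the text above).
WIDE_LEFT_ENDPOINT = date(1996, 3, 11)


def haart_intervals(visit_dates, reported_haart):
    """Hypothetical helper: 'narrow' and 'wide' censoring intervals (L, R] for H.

    visit_dates:    chronologically sorted visit dates for one woman
    reported_haart: booleans, True where HAART use was reported at that visit
    Raises ValueError if HAART use is never reported.
    """
    first = reported_haart.index(True)    # first visit with reported HAART use
    right = visit_dates[first]            # right endpoint R^H
    # left endpoint L^H: the immediately preceding visit without HAART use
    left = visit_dates[first - 1] if first > 0 else None
    narrow = (left, right)
    wide = (WIDE_LEFT_ENDPOINT, right)    # only the left endpoint changes
    return narrow, wide


visits = [date(1996, 1, 10), date(1996, 7, 15), date(1997, 1, 20), date(1997, 7, 18)]
reported = [False, False, True, True]
narrow, wide = haart_intervals(visits, reported)
```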
Fig. 4. CD4 counts (on square root scale; circles: HCV positive, triangles: HCV negative) and censoring intervals of H and V = H + W under two definitions of HAART initiation time intervals for two selected women in the HERS cohort. Censoring intervals under the 'narrow' definition are represented by dashed lines, and those under the 'wide' definition by solid lines. Censoring intervals of H and V = H + W are shown at the top and bottom of the panels, respectively.

For CD4 counts, we use the square-root transformation because exploratory analysis showed that the transformed counts better satisfy the assumptions of normality and homogeneous variance. In addition to baseline HCV serostatus, two
other covariates are included in the CD4 model: the observed CD4 count 
(scaled by 100) immediately before reported HAART initiation (pretreat- 
ment CD4 level) and the indicator of baseline injection drug use (IDU). For 
penalized splines approximating population-level smooth functions, we use 
truncated quadratic bases with 20 knots, allowing sufficient flexibility for 
capturing CD4 count changes at viral suppression times. These knots are 
placed at viral suppression times as well as at the sample quantiles of the 
realigned measurement times using midpoints of the observed censoring in- 
tervals for viral suppression. Because data for individual women are sparse 
over time and the maximum number of data points for individual women is 
15, we use truncated quadratic bases with one knot at the viral suppression 
times for estimating individual-level smooth functions. Since the first deriva- 
tives (velocities) of the population-level smooth functions can be computed 
in analytic form when truncated quadratic bases are used, we also examine 
the posterior inference for these derivatives. 
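The knot placement described above can be sketched as follows. The sketch follows the text in four steps:

1. Impute each viral suppression time by the midpoint of its censoring interval.
2. Realign each measurement time to time since viral suppression.
3. Place one knot at 0, the viral suppression time.
4. Place the remaining knots at sample quantiles of the realigned times.

The equally spaced quantile levels, the helper name `realigned_knots` and the simulated inputs are assumptions for illustration only.

```python
import numpy as np


def realigned_knots(times, lv, rv, n_knots=20):
    """Hypothetical helper: spline knots on the time-since-viral-suppression axis.

    times:  list of arrays of measurement times t_ij (one array per woman)
    lv, rv: left and right endpoints of each woman's viral suppression interval
    Returns n_knots sorted knots: one at 0 plus n_knots - 1 sample quantiles.
    """
    mid = (np.asarray(lv, float) + np.asarray(rv, float)) / 2.0  # midpoint for each V_i
    realigned = np.concatenate(
        [np.asarray(t, float) - m for t, m in zip(times, mid)]
    )
    # equally spaced interior quantile levels (an assumption, not from the text)
    probs = np.linspace(0.0, 1.0, n_knots + 1)[1:-1]
    knots = np.concatenate([[0.0], np.quantile(realigned, probs)])
    return np.sort(knots)


rng = np.random.default_rng(7)
# Simulated stand-in data for 30 women (days since enrollment), not HERS data
times = [np.sort(rng.uniform(0, 3000, size=8)) for _ in range(30)]
lv = rng.uniform(500, 1500, size=30)
rv = lv + rng.uniform(100, 400, size=30)
knots = realigned_knots(times, lv, rv)
```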

The prior specifications are as described in Section 3 and the Appendix. To assess sensitivity in estimating $G^H$ and $G^W$, the precision parameters $(\alpha^H, \alpha^W)$ of the Dirichlet processes are set to (1, 1) and (10, 10). These values express different levels of confidence in the prior normal base measures for H and W. We run two MCMC chains of 7000 iterations each and discard the first 2000 iterations of each chain. Convergence is assessed graphically using history plots, and the pooled 10,000 posterior samples are then used for inference. The results for the two values of $(\alpha^H, \alpha^W)$ are similar; here we present those with $(\alpha^H, \alpha^W) = (10, 10)$. MCMC is implemented in MATLAB programs [The MathWorks Inc. (1997)].
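The pooled sample size follows from 2 x (7000 - 2000) = 10,000. A sketch of the pooling step is given below. It assumes the draws of a scalar parameter are stored as an array with one row per chain; the simulated values are stand-ins, not HERS output.

```python
import numpy as np

n_chains, n_iter, burn_in = 2, 7000, 2000
rng = np.random.default_rng(0)
# Simulated stand-in for stored draws of one scalar parameter: (chain, iteration)
chains = rng.normal(size=(n_chains, n_iter))
pooled = chains[:, burn_in:].reshape(-1)   # drop burn-in from each chain, then pool
print(pooled.size)                         # 10000 = 2 * (7000 - 2000)
```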

If interest lies only in the doubly interval-censored event time W, a marginal model can be obtained by excluding the CD4 count component from (2.7). We compare the results from our joint model with those from the marginal models to investigate the possible impact of joint modeling.

4.1. Results for virologic response to HAART. Table 2 presents the posterior mean estimates of the percentiles of the time between HAART initiation and viral suppression for the HAART responder group. The results based on 'wide' intervals for H suggest that the HCV-negative group might take less time to achieve viral suppression than the HCV-positive group. This is not the case with 'narrow' intervals for H, where the HCV-negative group has a more right-skewed distribution. Further, the joint model tends to give smaller estimates than the marginal model. For example, in Table 2 both the location and the variability estimates from the joint model based on 'wide' intervals for H are smaller than those from the marginal model. This suggests that modeling CD4 counts affects the estimation of the doubly interval-censored W when the information from the censoring intervals is limited.

Table 3 gives the estimated proportions of HAART responders whose time between HAART initiation and viral suppression is at most 90 or 180 days. For both the 'wide' and the 'narrow' intervals for H, all 95% credible intervals for the differences in proportions between HCV groups include zero.
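Each difference and its 95% credible interval in Table 3 is a summary of posterior draws. The sketch below computes such a summary under two assumptions: the intervals are equal-tailed, and simulated Beta draws stand in for the MCMC output (the Beta parameters are illustrative only). Differences are taken as HCV negative minus HCV positive, which matches the signs of the differences reported in Table 3.

```python
import numpy as np


def summarize_difference(draws_neg, draws_pos, level=0.95):
    """Posterior mean and equal-tailed credible interval of draws_neg - draws_pos."""
    diff = np.asarray(draws_neg, float) - np.asarray(draws_pos, float)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(diff, [tail, 1.0 - tail])
    covers_zero = bool(lower <= 0.0 <= upper)
    return diff.mean(), (lower, upper), covers_zero


rng = np.random.default_rng(1)
# Simulated stand-ins for posterior draws of p(W < 90 | V < T); not HERS output
p_pos = rng.beta(48, 52, size=10_000)   # HCV positive
p_neg = rng.beta(52, 48, size=10_000)   # HCV negative
mean_diff, (lower, upper), covers_zero = summarize_difference(p_neg, p_pos)
```

Because the differences are computed draw by draw, the interval accounts for the joint posterior uncertainty in both groups rather than combining two separate intervals.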

Table 2

Percentiles (posterior mean estimates) of the time between HAART initiation and viral suppression (in days), W | V < T, for the HAART responder group by HCV serostatus in marginal and joint models; 'narrow' stands for 'narrow' intervals for H, 'wide' stands for 'wide' intervals for H

Intervals  Model     HCV     5%    25%    50%    75%    95%
'narrow'   Marginal  HCV+    15     37    126    654   1339
                     HCV-    13     39    118    625   1384
           Joint     HCV+    13     28     88    291    906
                     HCV-    13     31     82    356    959
'wide'     Marginal  HCV+     3    145    582   1129   1497
                     HCV-     1    120    436   1021   1521
           Joint     HCV+     1    122    350    793   1232
                     HCV-     1     91    322    768   1315
Thus, in the HERS cohort there is insufficient evidence that baseline HCV serostatus is associated with virologic response to HAART. This is also illustrated in Figure 5, where the hazard functions of viral suppression are plotted over grid points spaced 30 days apart. Here the hazard is defined as $p(W < t_2 \mid W > t_1, V < T)$, where $t_1 < t_2$ are consecutive grid points. With both 'narrow' and 'wide' intervals for H, the hazard functions of viral suppression are generally similar across the HCV groups. Note that the estimated proportions of HAART responders, $p(V < T)$, are also similar for the HCV groups in all cases.
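The hazard on a 30-day grid can be computed from imputed event times. The sketch below assumes the input is the set of imputed times W for HAART responders (V < T) from a single posterior draw. Averaging the resulting curves over draws would give a posterior mean hazard. The function name and the default 720-day horizon are hypothetical.

```python
import numpy as np


def discrete_hazard(w, grid_step=30.0, horizon=720.0):
    """Empirical hazard p(W < t2 | W > t1) for consecutive grid points t1 < t2.

    w: imputed times (days) from HAART initiation to viral suppression for the
       HAART responders (V < T) in a single posterior draw.
    Returns the grid and one hazard value per grid interval (NaN if nobody is at risk).
    """
    w = np.asarray(w, float)
    grid = np.arange(0.0, horizon + grid_step, grid_step)
    hazard = np.full(len(grid) - 1, np.nan)
    for k, (t1, t2) in enumerate(zip(grid[:-1], grid[1:])):
        at_risk = w > t1                       # still unsuppressed at t1
        if at_risk.any():
            hazard[k] = np.mean(w[at_risk] < t2)
    return grid, hazard


grid, hazard = discrete_hazard([10.0, 40.0, 45.0, 100.0], grid_step=30.0, horizon=120.0)
```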

From Table 2, the median estimates of the time between HAART initiation and viral suppression in the joint model are approximately one year with 'wide' intervals for H and 3 to 4 months with 'narrow' intervals for H. Compared with clinically expected values, the estimates with 'wide' intervals for H may be overestimated, for three reasons:

1. Data were collected approximately every six months in the HERS, so the immediate virologic response to HAART was not observed.
2. HAART information was self-reported, and to reduce reporting bias we set the left endpoints of the HAART initiation intervals to March 11th, 1996. Consequently, the censoring intervals for the observed HAART initiation times are wide.
3. 38% of the participants had right-censored viral suppression times, which might be related to adherence to HAART and to individual heterogeneity in virologic response.

However, these situations do not differ by HCV serostatus, so the comparison between HCV groups remains useful.

Table 3

Proportions (posterior mean estimates) of HAART responders and proportions of HAART responders with time between HAART initiation and viral suppression less than 90 (180) days, by HCV serostatus, from marginal and joint models in the HERS cohort. Difference is HCV- minus HCV+; 95% credible intervals are in square brackets; 'narrow' stands for 'narrow' intervals for H, 'wide' stands for 'wide' intervals for H

Intervals  Model     Group        p(V < T)        p(W < 90 | V < T)   p(W < 180 | V < T)
'narrow'   Marginal  HCV+         0.75            0.42                0.56
                     HCV-         0.72            0.43                0.56
                     Difference   -0.03           0.02                -0.01
                                  [-0.14, 0.08]   [-0.24, 0.25]       [-0.12, 0.11]
           Joint     HCV+         0.63            0.48                0.66
                     HCV-         0.64            0.52                0.62
                     Difference   0.01            0.05                -0.04
                                  [-0.05, 0.06]   [-0.24, 0.31]       [-0.14, 0.07]
'wide'     Marginal  HCV+         0.85            0.13                0.29
                     HCV-         0.78            0.22                0.33
                     Difference   -0.07           0.08                0.04
                                  [-0.22, 0.07]   [-0.03, 0.19]       [-0.06, 0.14]
           Joint     HCV+         0.68            0.17                0.36
                     HCV-         0.68            0.24                0.38
                     Difference   0.01            0.07                0.01
                                  [-0.05, 0.07]   [-0.06, 0.20]       [-0.10, 0.12]

Fig. 5. Hazard function of viral suppression after HAART initiation by HCV serostatus in the HERS cohort over grid points of 30 days from the joint model; left panel: 'narrow' intervals for H; right panel: 'wide' intervals for H.

4.2. Results for immunologic response to HAART. The results for CD4 
counts are similar under both definitions of censoring intervals for HAART 
initiation and we present those based on 'wide' intervals for H. 

4.2.1. Population estimates. We compute posterior mean estimates for all targets of inference. The coefficient estimate for pretreatment CD4 level is 2.35 (95% credible interval [2.22, 2.49]). This indicates a clear positive association between pretreatment CD4 level and the current CD4 count, given baseline HCV and IDU status. The coefficient estimate for baseline IDU is -0.06 (95% credible interval [-0.80, 0.64]). This suggests that baseline IDU status was not associated with current CD4 counts, given baseline HCV status and pretreatment CD4 level.

Fig. 6. (a) Estimated CD4 count profiles by HCV group for HAART responders (transformed to the original CD4 count scale) in the joint model, after accounting for pretreatment CD4 level and baseline injection drug use: solid line, HCV-positive group; dotted line, HCV-negative group. (b) Difference between CD4 count profiles (in original CD4 count scale) in the joint model: solid line, posterior mean estimates; dotted lines, 95% pointwise credible bands. (c) Derivatives of CD4 count profiles by HCV group for HAART responders (in square root CD4 count scale) in the joint model, after accounting for pretreatment CD4 level and baseline injection drug use. (d) Difference between derivatives of CD4 count profiles (in square root CD4 count scale) in the joint model. The ticks at the top and the bottom of the panels are the HAART initiation times corresponding to the 5%, 50% and 95% quantiles of the time between HAART initiation and viral suppression in Table 2: solid line, HCV-positive group; dotted line, HCV-negative group.

For HAART responders, mean CD4 count profiles (after accounting for pretreatment CD4 level and baseline IDU) are plotted in panel (a) of Figure 6. We transform the estimates back to the original CD4 count scale for illustration.

The estimated CD4 count profiles of both HCV groups were decreasing during the period 3 to 6 years before viral suppression. CD4 counts started to increase before HIV was completely suppressed (time point 0). This is consistent with findings from other studies: CD4 cells may increase after HAART in patients who do not fully suppress the virus, because their viral load is decreasing [Jacobson, Phair and Yamashita (2004)]. However, Figure 6(a) also suggests that, around the time HAART was initiated, the decreasing trend ends earlier for HCV-negative patients than for HCV-positive patients.

In addition, the average CD4 level after viral suppression is higher for HCV-negative patients than for HCV-positive patients. For example, at the time of viral suppression, the difference in average CD4 count between HCV groups is approximately 16 (95% credible interval [-3, 35]), controlling for pretreatment CD4 level and baseline IDU. We also plot the difference curve between the mean CD4 count profiles of the HCV groups [Figure 6(b), in original CD4 count scale]. The pointwise 95% credible bands lie approximately above zero after CD4 counts start to increase. The difference between the point estimates of mean CD4 count at the left boundary of the time-since-viral-suppression axis may reflect the small sample size and large estimation variability there, as suggested by the width of the 95% pointwise credible bands.
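Back-transforming to the original CD4 scale and forming pointwise bands can be done draw by draw. The sketch below assumes that posterior draws of each group's mean profile on the square-root scale, evaluated on a shared time grid, are available as arrays. Squaring each draw before differencing, rather than squaring a posterior mean, is our assumption about how such a back-transformation would be carried out. The function name is hypothetical.

```python
import numpy as np


def pointwise_difference_band(sqrt_draws_neg, sqrt_draws_pos, level=0.95):
    """Difference of mean CD4 profiles on the original scale, with pointwise bands.

    sqrt_draws_*: arrays of shape (n_draws, n_grid) with posterior draws of a group's
    mean profile on the square-root scale, evaluated on a common time grid.
    Each draw is squared before differencing, so summaries refer to the count scale.
    """
    diff = np.square(sqrt_draws_neg) - np.square(sqrt_draws_pos)
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(diff, [tail, 1.0 - tail], axis=0)
    return diff.mean(axis=0), lower, upper
```

A band that lies above zero over a stretch of the grid corresponds to the pattern described for Figure 6(b).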

To evaluate immune reconstitution after HAART, the rate of CD4 count change is a useful measure. Panel (c) of Figure 6 presents the derivative (velocity) curves of the mean CD4 count profiles of HAART responders. For both HCV groups, the velocity of average CD4 count change reaches its maximum at approximately the viral suppression time. This is sensible because the major driving force of immune reconstitution is viral suppression [Jacobson, Phair and Yamashita (2004)]. Overall, the HCV-negative group has slightly larger point estimates of the mean rate of CD4 count change leading up to and following viral suppression. Panel (d) of Figure 6 gives the difference between the derivative curves of the HCV groups and the corresponding 95% credible bands. After controlling for pretreatment CD4 level and baseline IDU, the rates of mean CD4 count change do not appear to differ by HCV serostatus in the HERS cohort.

Fig. 7. (a) Estimated CD4 count profiles by HCV group for HAART nonresponders (transformed to the original CD4 count scale) in the joint model, after accounting for pretreatment CD4 level and baseline injection drug use: solid line, HCV-positive group; dotted line, HCV-negative group. (b) Difference between CD4 count profiles (in original CD4 count scale) in the joint model: solid line, posterior mean estimates; dotted lines, 95% pointwise credible bands.

The left panel of Figure 7 presents the mean CD4 count profiles for HAART nonresponders (in original CD4 count scale) against time since enrollment. Both HCV groups had the same decreasing pattern. The difference curve and its 95% credible band (right panel of Figure 7) indicate no difference in mean CD4 count levels between HCV groups in this nonresponder population, after adjusting for pretreatment CD4 level and baseline IDU.

4.2.2. Individual estimates. The estimates for individuals need not follow the population pattern exactly if the between-subject variation is large. For the nine HERS women selected in Section 1, Figure 8 plots the observed data, 50 sample curves from the posterior predictive distributions and the averages of 50 sampled mean curves. Comparing these with Figures 6 and 7, we see that both the magnitude and the pattern differ between the population and the individual estimated profiles. Nevertheless, the model fits this representative sample of individuals well.

5. Conclusion and discussion. We proposed a joint model for doubly interval-censored event time and longitudinal data in HIV natural history studies in order to investigate post-HAART HIV dynamics and associated factors. Using data from the HERS cohort, we found that HCV-negative and HCV-positive patients had similar virologic response, as measured by the time from HAART initiation to viral suppression. Further, our results show that, among patients with virologic response to HAART, being HCV seronegative is associated with a higher average CD4 count after viral suppression, given the same pretreatment CD4 level and baseline IDU status. The HCV-negative group showed a slightly higher level of immune reconstitution (measured by the rate of mean CD4 count change) leading up to and following viral suppression. However, the evidence from the HERS cohort is not sufficient to support this conclusion.

Data from natural history studies have been used to evaluate the effect of HCV coinfection on post-HAART HIV dynamics [Greub et al. (2000); Sulkowski et al. (2002); Miller et al. (2005)]. However, in those studies, virologic and immunologic responses were investigated separately, and simple summary statistics were used for inference. Examples include average CD4 count increases after HAART initiation by visit, and the hazard ratio of an increase in CD4 count of at least 50 cells/μl within a year. In contrast, our method accounts for the characteristics of longitudinal cohort data as well as the biological background of post-HAART HIV dynamics, such as the sequential relationship between virologic and immunologic response. Our joint modeling approach uses all available information from natural history studies, and the results can be informative in generating hypotheses for AIDS clinical trials.

In the HERS analysis, we considered the women with V > T as HAART nonresponders and examined their population mean CD4 count profiles. However, the data are from a natural history study and the observed HAART initiation times vary across individuals, so the observed data on viral suppression time actually depend on the timing of HAART initiation. Therefore, the HERS women with V > T might not be a homogeneous group in terms of response to HAART. The definition of 'responder', however, does not differ by HCV status. Thus, for comparison purposes, it is still useful to examine the population mean CD4 count profiles both for women with V > T and for women with V < T.

Fig. 8. CD4 count data (on square root scale) and 50 posterior predictive sample curves in the joint model for 9 selected women in the HERS cohort. Vertical dotted lines are censoring intervals for HAART initiation (under the 'wide' definition), and vertical solid lines are censoring intervals for viral suppression. Except for panels (a) and (e), where $V_i > T$, the ticks at the bottom of each panel are imputed viral suppression times ($v_i < T$). Circles represent data from the HCV-positive group, and triangles represent data from the HCV-negative group. Solid lines are averages of 50 sampled mean curves.

Because the data are sparse, information on event times for evaluating virologic response is limited in the HERS cohort. To reduce possible reporting bias regarding HAART initiation, we use two definitions of censoring intervals for HAART initiation and investigate their impact on the analysis. The conclusions about HCV serostatus and post-HAART HIV dynamics do not differ between the definitions. However, the actual estimates of the time between HAART initiation and viral suppression might be larger than clinically expected values. Possible reasons include the study design, the conservative definition of censoring intervals, participant noncompliance, drug resistance and other individual heterogeneity in virologic response to HAART. Because moving the left endpoints of HAART initiation time to the earliest possible date is conservative, another option would be a hybrid approach: change the censoring intervals only for those with suspicious viral suppression immediately before the self-reported HAART initiation date. Alternatively, to reflect uncertainty about the true HAART initiation time, we could specify a uniform prior for the left boundary of HAART initiation time between the left boundaries defined by the 'narrow' and 'wide' intervals.

Besides HCV coinfection, other potential determinants or modifiers of post-HAART HIV dynamics include:

- characteristics of the HAART regimen;
- prior antiviral treatment history;
- stage of disease at the time of HAART initiation (viral load level);
- an intact immune system;
- other host characteristics, such as age, race, gender and genotype [Jacobson, Phair and Yamashita (2004)].

To adjust for these factors, covariates can be added to the CD4 count model (2.8) in the same way as pretreatment CD4 level and baseline IDU status. For doubly interval-censored data, one limitation of our Bayesian semiparametric approach is that sample sizes could be too small for reliable estimation when the covariates take many distinct value combinations. For example, only 4 HERS women were IDU and HCV negative at baseline. Therefore, we could not assign a different DPP to every combination of covariate values when baseline IDU is included as a covariate. In this scenario, a parametric approach can be developed to adjust for additional covariates.

We believe that the proposed joint modeling approach is methodologically 
valuable. The proposed regression spline method is simple to implement, 
and naturally incorporates the typical features of longitudinal data such as 
between-individual and within-individual variations. The proposed model 
can be extended to characterize multiple processes in disease progression 
after treatment intervention, for example, the neurocognitive response to 
HAART treatment after immune reconstitution is another process of interest 
apart from the virologic and immunologic response [Bell (2004)]. 
APPENDIX: FULL CONDITIONAL DISTRIBUTIONS FOR GIBBS STEPS IN SECTION 3

A.1. Data augmentation for event times. A value for each censored observation, $H_i$, is sampled from the conditional distribution of $H_i$ given all other parameters. Under a DPP this conditional distribution maintains the same Polya urn structure assumed a priori for $H_1, \ldots, H_N$. It can be shown that the full conditional distribution of $H_i$ has the following form:

$$
\begin{aligned}
&[H_i \mid \mathbf{Y}_i, \mathbf{T}_i, \mathbf{X}_i, \mathbf{W}, \{H_j, j \neq i\}, \mathbf{L}^H, \mathbf{R}^H, \mathbf{L}^V, \mathbf{R}^V, \Lambda^H, \Lambda^W, \theta] \\
&\qquad \sim r_0 \cdot g_T^H(h_i \mid Z_i, v_i, l_i^H, r_i^H, \Lambda^H) + \sum_{j \neq i} r_j \cdot I(h_j = h_i),
\end{aligned}
\tag{A.1}
$$

where $g_T^H$ is the truncated posterior distribution on the censoring interval $(l_i^H, \min(r_i^H, v_i)]$. Note that $\mathbf{Y}_i$ does not enter (A.1) because, conditional on $V_i$, $\mathbf{Y}_i$ and $H_i$ are independent. Since $V_i$ only provides information on the range of $H_i$, $g_T^H$ is simply the truncated version of $g^H$, the base measure of $H_i$ given $Z_i$. Furthermore,

$$
r_0 \propto \alpha^H \int_{l_i^H}^{\min(r_i^H, v_i)} g^H(h_i \mid Z_i; \Lambda^H)\, dh_i, \qquad
r_j \propto I\bigl(l_i^H < h_j \le \min(r_i^H, v_i),\ Z_j = Z_i\bigr), \quad j \neq i,
$$

and $r_0 + \sum_{j \neq i} r_j = 1$. Thus, a new value of $H_i$ is equal either to $h_j$ with probability $r_j$, or to a value sampled from $g_T^H$ with probability $r_0$. We also assume that, depending on the value of $Z_i$, the base measures $g^H$ are normal distributions with distinct parameters $(\mu_1^H, \tau_1^H)$ or $(\mu_0^H, \tau_0^H)$.
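The Polya urn update for $H_i$ can be sketched as below. The sketch makes three assumptions:

- The normal base measure for the woman's HCV group is parameterized by a mean and a standard deviation. The text writes $(\mu, \tau)$ without fixing whether $\tau$ is a scale or a precision.
- SciPy's `truncnorm` supplies the fresh draw from $g_T^H$.
- Every eligible $h_j$ gets weight 1, as in the expression for $r_j$.

The function name and argument layout are hypothetical.

```python
import numpy as np
from scipy.stats import norm, truncnorm


def update_h_i(i, h, z, left, upper, alpha, mu, sd, rng):
    """One Polya urn update of H_i with the mixture structure of (A.1).

    h:      current imputed values h_1, ..., h_N (NumPy array)
    z:      HCV indicators Z_1, ..., Z_N (NumPy array)
    left:   l_i^H; upper: min(r_i^H, v_i), so the support is (left, upper]
    alpha:  DP precision parameter alpha^H
    mu, sd: mean and standard deviation of the normal base measure for group Z_i
    """
    # r_0: alpha times the base-measure mass of (left, upper]
    r0 = alpha * (norm.cdf(upper, loc=mu, scale=sd) - norm.cdf(left, loc=mu, scale=sd))
    # r_j: weight 1 for each other h_j in the interval with Z_j = Z_i
    eligible = (np.arange(len(h)) != i) & (z == z[i]) & (h > left) & (h <= upper)
    weights = np.concatenate([[r0], eligible.astype(float)])
    weights /= weights.sum()              # r_0 + sum_j r_j = 1
    pick = rng.choice(len(weights), p=weights)
    if pick == 0:
        # fresh value from the truncated base measure g_T^H
        a, b = (left - mu) / sd, (upper - mu) / sd
        return float(truncnorm.rvs(a, b, loc=mu, scale=sd, random_state=rng))
    return float(h[pick - 1])             # tie with an existing h_j


rng = np.random.default_rng(3)
h = np.array([1.0, 2.0, 3.0, 10.0])
z = np.array([1, 1, 1, 0])
new_h0 = update_h_i(0, h, z, left=0.5, upper=4.0, alpha=1.0, mu=2.0, sd=1.0, rng=rng)
```

If no other $h_j$ is eligible and the base measure puts negligible mass on the interval, the weights degenerate. A production sampler would need to guard against that case.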
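For $W_i = V_i - H_i$, the full conditional distribution is

$$
\begin{aligned}
&[W_i \mid \mathbf{Y}_i, \mathbf{T}_i, \mathbf{X}_i, \mathbf{H}, \{W_j, j \neq i\}, \mathbf{L}^H, \mathbf{R}^H, \mathbf{L}^V, \mathbf{R}^V, \Lambda^H, \Lambda^W, \theta] \\
&\qquad \sim q_0 \cdot g_T^W(w_i \mid \mathbf{y}_i, \mathbf{T}_i, \mathbf{X}_i, h_i, l_i^V, r_i^V, \Lambda^W) + \sum_{j \neq i} q_j \cdot I(w_j = w_i),
\end{aligned}
$$

where

$$
g_T^W(w_i \mid \mathbf{y}_i, \mathbf{T}_i, \mathbf{X}_i, h_i, l_i^V, r_i^V, \Lambda^W) \propto p_3\bigl(\mathbf{y}_i \mid \mathbf{X}_i, \mathbf{T}_i - (h_i + w_i); \theta\bigr)\, g^W(w_i \mid Z_i; \Lambda^W)\, I\bigl(\max(0, l_i^V - h_i) < w_i \le r_i^V - h_i\bigr)
$$

is the truncated posterior distribution of $W_i$ on $(\max(0, l_i^V - h_i), r_i^V - h_i]$. Furthermore,

$$
\begin{aligned}
q_0 &\propto \alpha^W \int_{\max(0,\, l_i^V - h_i)}^{r_i^V - h_i} p_3\bigl(\mathbf{y}_i \mid \mathbf{X}_i, \mathbf{T}_i - (h_i + w_i); \theta\bigr)\, g^W(w_i \mid Z_i; \Lambda^W)\, dw_i, \\
q_j &\propto p_3\bigl(\mathbf{y}_i \mid \mathbf{X}_i, \mathbf{T}_i - (h_i + w_j); \theta\bigr)\, I\bigl(\max(0, l_i^V - h_i) < w_j \le r_i^V - h_i,\ Z_j = Z_i\bigr), \quad j \neq i,
\end{aligned}
$$

and $q_0 + \sum_{j \neq i} q_j = 1$. Thus, a new value of $W_i$ is equal either to $w_j$ with probability $q_j$, or to a value sampled from $g_T^W$ with probability $q_0$. Here $g_T^W$ is the full conditional distribution of $W$ that would be obtained if the completely parametric hierarchical model (2.6) were used, and $g^W$ is the prior distribution (base measure) of $W$ given $Z$. We again assume that the $g^W$ are normal distributions with distinct parameters $(\mu_1^W, \tau_1^W)$ and $(\mu_0^W, \tau_0^W)$. Because $p_3(\mathbf{y}_i \mid \mathbf{X}_i, \mathbf{T}_i - (h_i + w_i); \theta)$ is based on the model in (2.8), $g_T^W$ has no closed form, and a Metropolis step [Gelman et al. (2003)] is used for sampling. The integral in $q_0$ is approximated by Gauss-Legendre quadrature with 20 nodes.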
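The 20-node Gauss-Legendre approximation maps the standard nodes and weights on [-1, 1] to the integration limits. Below is a generic sketch using NumPy's `numpy.polynomial.legendre.leggauss`. In the sampler, the integrand for $q_0$ would be the product of $p_3$ and the base density $g^W$ over $(\max(0, l_i^V - h_i), r_i^V - h_i]$. That product is not reproduced here, so a simple integrand stands in.

```python
import numpy as np


def gauss_legendre(f, a, b, n_nodes=20):
    """Approximate the integral of f over [a, b] by n-node Gauss-Legendre quadrature."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)  # rule on [-1, 1]
    half_width, center = 0.5 * (b - a), 0.5 * (a + b)
    x = half_width * nodes + center                              # nodes mapped to [a, b]
    return half_width * np.sum(weights * f(x))


approx = gauss_legendre(np.exp, 0.0, 1.0)   # integral of exp over [0, 1] is e - 1
```

An n-node rule integrates polynomials of degree up to 2n - 1 exactly. It is therefore highly accurate for smooth integrands like a normal density times a well-behaved likelihood over a bounded interval.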

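A.2. Update parameters in the CD4 count model. We use Bayesian penalized splines [Ruppert, Wand and Carroll (2003)] with a truncated polynomial basis for approximating CD4 count profiles at both the population level and the individual level.

Following Ruppert, Wand and Carroll (2003), $m_1(t)$, $m_0(t)$, $c_1(t)$, $c_0(t)$, $\gamma_i^m(t)$ and $\gamma_i^c(t)$ $(i = 1, \ldots, N)$ in (2.8) can be approximated by

$$
\begin{aligned}
m_1(t) &= B(t)^T \beta_1, & m_0(t) &= B(t)^T \beta_2,\\
c_1(t) &= A(t)^T \alpha_1, & c_0(t) &= A(t)^T \alpha_2,\\
\gamma_i^m(t) &= \phi(t)^T b_i, & \gamma_i^c(t) &= \psi(t)^T a_i,
\end{aligned}
$$

where the truncated polynomial bases are

$$
\begin{aligned}
B(t) &= (1, t, \ldots, t^p, (t - \kappa_1)_+^p, \ldots, (t - \kappa_{K_B})_+^p)^T,\\
A(t) &= (1, t, \ldots, t^p, (t - \xi_1)_+^p, \ldots, (t - \xi_{K_A})_+^p)^T,\\
\phi(t) &= (1, t, \ldots, t^p, (t - \eta_1)_+^p, \ldots, (t - \eta_{K_\phi})_+^p)^T,\\
\psi(t) &= (1, t, \ldots, t^p, (t - \zeta_1)_+^p, \ldots, (t - \zeta_{K_\psi})_+^p)^T.
\end{aligned}
$$

Here $p > 1$ is an integer and $(d)_+^p = d^p \cdot I(d > 0)$. The vectors $(\kappa_1, \ldots, \kappa_{K_B})$, $(\xi_1, \ldots, \xi_{K_A})$, $(\eta_1, \ldots, \eta_{K_\phi})$ and $(\zeta_1, \ldots, \zeta_{K_\psi})$ are the corresponding knots, and $(K_B, K_A, K_\phi, K_\psi)$ are the numbers of knots.

Let $\beta_1 = (\beta_{1,0}, \ldots, \beta_{1,p+K_B})^T$, $\beta_2 = (\beta_{2,0}, \ldots, \beta_{2,p+K_B})^T$, $\alpha_1 = (\alpha_{1,0}, \ldots, \alpha_{1,p+K_A})^T$, $\alpha_2 = (\alpha_{2,0}, \ldots, \alpha_{2,p+K_A})^T$, $b_i = (b_{i,0}, \ldots, b_{i,p+K_\phi})^T$, $a_i = (a_{i,0}, \ldots, a_{i,p+K_\psi})^T$ and $x_{ij} = t_{ij} - V_i$. Then the proposed model in (2.8) can be rewritten as

$$
Y_{ij} =
\begin{cases}
B(x_{ij})^T \beta_1 + \phi(x_{ij})^T b_i + X_i^* \beta^* + \epsilon_{ij}, & \text{if } V_i < T,\ Z_i = 1,\\
B(x_{ij})^T \beta_2 + \phi(x_{ij})^T b_i + X_i^* \beta^* + \epsilon_{ij}, & \text{if } V_i < T,\ Z_i = 0,\\
A(t_{ij})^T \alpha_1 + \psi(t_{ij})^T a_i + X_i^* \beta^* + \epsilon_{ij}, & \text{if } V_i > T,\ Z_i = 1,\\
A(t_{ij})^T \alpha_2 + \psi(t_{ij})^T a_i + X_i^* \beta^* + \epsilon_{ij}, & \text{if } V_i > T,\ Z_i = 0.
\end{cases}
$$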
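The bases $B(t)$, $A(t)$, $\phi(t)$ and $\psi(t)$ share one construction. Because each column is a polynomial piece, the derivative behind the velocity curves in Section 4 is available in closed form. Below is a sketch of both, with hypothetical function names. A population curve and its velocity would then be `basis @ beta` and `derivative @ beta` for a coefficient vector `beta`, for example a posterior draw of $\beta_1$.

```python
import numpy as np


def truncated_power_basis(t, knots, p=2):
    """Rows (1, t, ..., t^p, (t - k_1)_+^p, ..., (t - k_K)_+^p) for each t."""
    t = np.asarray(t, float)[:, None]
    d = t - np.asarray(knots, float)[None, :]
    poly = t ** np.arange(p + 1)
    trunc = np.where(d > 0, d ** p, 0.0)
    return np.hstack([poly, trunc])


def truncated_power_basis_derivative(t, knots, p=2):
    """Derivative in t of every basis column (p >= 1)."""
    t = np.asarray(t, float)[:, None]
    d = t - np.asarray(knots, float)[None, :]
    powers = np.arange(p + 1)
    poly = powers * t ** np.maximum(powers - 1, 0)      # d/dt t^k = k t^(k-1)
    trunc = np.where(d > 0, p * d ** (p - 1), 0.0)       # d/dt (t - k)_+^p
    return np.hstack([poly, trunc])
```

With p = 2 (the truncated quadratic bases used for HERS), the derivative is continuous at the knots, so the estimated velocity curves are continuous as well.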
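We use standard prior distributions for all parameters in the CD4 count model, as follows.

- Covariate effects: $p(\beta^*) \propto 1$.
- Polynomial coefficients, for $s = 0, \ldots, p$: $p(\beta_{1,s}) \propto 1$, $p(\beta_{2,s}) \propto 1$, $p(\alpha_{1,s}) \propto 1$, $p(\alpha_{2,s}) \propto 1$, $b_{i,s} \sim N(0, \sigma_{b_s}^2)$ and $a_{i,s} \sim N(0, \sigma_{a_s}^2)$, with $\sigma_{b_s}^{-2} \sim \text{Gamma}(10^{-3}, 10^{-3})$ and $\sigma_{a_s}^{-2} \sim \text{Gamma}(10^{-3}, 10^{-3})$.
- Population truncated-power coefficients: for $k = 1, \ldots, K_B$, $\beta_{1,p+k} \sim N(0, \sigma_{\beta_1}^2)$ and $\beta_{2,p+k} \sim N(0, \sigma_{\beta_2}^2)$; for $k = 1, \ldots, K_A$, $\alpha_{1,p+k} \sim N(0, \sigma_{\alpha_1}^2)$ and $\alpha_{2,p+k} \sim N(0, \sigma_{\alpha_2}^2)$.
- Individual truncated-power coefficients: for $k = 1, \ldots, K_\phi$, $b_{i,p+k} \sim N(0, \sigma_b^2)$; for $k = 1, \ldots, K_\psi$, $a_{i,p+k} \sim N(0, \sigma_a^2)$.
- Precisions: $\sigma_{\beta_1}^{-2}$, $\sigma_{\beta_2}^{-2}$, $\sigma_{\alpha_1}^{-2}$, $\sigma_{\alpha_2}^{-2}$, $\sigma_b^{-2}$ and $\sigma_a^{-2}$ follow $\text{Gamma}(10^{-3}, 10^{-3})$ distributions.

Note that $\sigma_{\beta_1}^2$, $\sigma_{\beta_2}^2$, $\sigma_{\alpha_1}^2$ and $\sigma_{\alpha_2}^2$ are smoothing parameters for the population penalized splines, and $\sigma_b^2$ and $\sigma_a^2$ are smoothing parameters for the individual penalized splines. The parameters $\sigma_{b_s}^2$ and $\sigma_{a_s}^2$ $(s = 0, \ldots, p)$ are variance components for the random effects. Further, we assume $\epsilon_{ij} \sim N(0, \sigma_\epsilon^2)$ for all observations and $\sigma_\epsilon^{-2} \sim \text{Gamma}(10^{-3}, 10^{-3})$.

Thus, the parameter vector $\theta$ includes $(\beta^*, \beta_1, \beta_2, \alpha_1, \alpha_2, b_i, a_i)$ and $(\sigma_{\beta_1}^2, \sigma_{\beta_2}^2, \sigma_{\alpha_1}^2, \sigma_{\alpha_2}^2, \sigma_b^2, \sigma_a^2, \sigma_{b_s}^2, \sigma_{a_s}^2, \sigma_\epsilon^2)$. Since all conditional posterior distributions for $\theta$ are in closed form, the Gibbs steps are straightforward.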
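The closed-form Gibbs steps rest on normal-gamma conjugacy. As an example, the draw of the error precision can be sketched as below, assuming $\text{Gamma}(10^{-3}, 10^{-3})$ is a shape-rate prior on $\sigma_\epsilon^{-2}$. The same function would update a smoothing precision such as $\sigma_{\beta_1}^{-2}$ if the penalized coefficients $\beta_{1,p+1}, \ldots, \beta_{1,p+K_B}$ were passed in place of the residuals. The function name is hypothetical.

```python
import numpy as np


def update_precision(values, rng, a=1e-3, b=1e-3):
    """Draw a precision sigma^{-2} from its Gamma full conditional.

    Assumes values ~ N(0, sigma^2) independently and sigma^{-2} ~ Gamma(a, b)
    in the shape-rate parameterization, so the full conditional is
    Gamma(a + n / 2, b + sum(values**2) / 2).
    """
    v = np.asarray(values, float)
    shape = a + v.size / 2.0
    rate = b + 0.5 * np.sum(v ** 2)
    return rng.gamma(shape, 1.0 / rate)   # NumPy's gamma takes a scale = 1 / rate
```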

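A.3. Update parameters for DPP base measures $G_0^H$ and $G_0^W$. The parameters $\Lambda^H$ and $\Lambda^W$ are updated from their full conditional distributions:

$$
\begin{aligned}
&[\Lambda^H \mid \mathbf{Y}_1, \ldots, \mathbf{Y}_N, \mathbf{X}_1, \ldots, \mathbf{X}_N, \mathbf{T}, \mathbf{H}, \mathbf{W}, \mathbf{L}^H, \mathbf{R}^H, \mathbf{L}^V, \mathbf{R}^V, \theta, \Lambda^W] \propto \prod_{i \in \mathcal{S}^H} g^H(h_i \mid Z_i; \Lambda^H)\, f(\Lambda^H),\\
&[\Lambda^W \mid \mathbf{Y}_1, \ldots, \mathbf{Y}_N, \mathbf{X}_1, \ldots, \mathbf{X}_N, \mathbf{T}, \mathbf{H}, \mathbf{W}, \mathbf{L}^H, \mathbf{R}^H, \mathbf{L}^V, \mathbf{R}^V, \theta, \Lambda^H] \propto \prod_{i \in \mathcal{S}^W} g^W(w_i \mid Z_i; \Lambda^W)\, f(\Lambda^W),
\end{aligned}
$$

where $\mathcal{S}^H$ and $\mathcal{S}^W$ are the subsets of indexes corresponding to the distinct $H_i$ and $W_i$. Only the distinct values enter because they are random samples from $G_0^H$ and $G_0^W$, respectively [Blackwell and MacQueen (1973)]. In our case, $\Lambda^H = (\mu_1^H, \mu_0^H, \tau_1^H, \tau_0^H)$ and $\Lambda^W = (\mu_1^W, \mu_0^W, \tau_1^W, \tau_0^W)$ for the normal base measures. We assume $f(\mu_1^H, \mu_0^H, \tau_1^H, \tau_0^H) \propto (\tau_1^H \tau_0^H)^{-1}$ and $f(\mu_1^W, \mu_0^W, \tau_1^W, \tau_0^W) \propto (\tau_1^W \tau_0^W)^{-1}$. The conditional posterior distributions of $\Lambda^H$ and $\Lambda^W$ are both in closed form.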